The next four subsections show the most frequent sentence beginnings consisting of N words, for N=1, 2, 3, 4. This subsection starts with N=1.
The most frequent word N-grams at the beginning of sentences give some insight into how sentences are composed.
For N=1 in particular, even a small corpus suffices to identify the most frequent sentence beginnings.

```sql
-- First word of each sentence = everything before the first space
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt
from sentences
group by substring_index(sentence, ' ', 1)
order by cnt desc
limit 50;
```

| Rank | Count | Beginning |
|---:|---:|---|
| 7167 | 2375 | El |
| 15610 | 1674 | La |
| 9787 | 1359 | En |
| 18271 | 787 | Los |
| 22926 | 389 | Por |
| 17173 | 384 | Las |
| 11525 | 374 | Es |
| 21835 | 368 | Para |
| 25158 | 360 | Se |
| 5275 | 353 | De |
| 20531 | 344 | No |
| 238 | 324 | A |
| 25958 | 323 | Si |
| 29469 | 311 | Y |
| 12118 | 288 | Esta |
| 12593 | 270 | Este |
| 22407 | 242 | Pero |
| 4154 | 233 | Con |
| 1068 | 231 | Al |
| 27274 | 227 | También |
| 28402 | 213 | Un |
| 28406 | 199 | Una |
| 26250 | 195 | Sin |
| 3809 | 175 | Como |
| 6630 | 163 | Dominicana |
| 18081 | 155 | Lo |
| 26955 | 153 | Su |
| 25344 | 151 | Según |
| 4905 | 136 | Cuando |
| 12891 | 129 | Esto |
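The query above generalizes to N-word beginnings by changing the count argument of `substring_index` from 1 to N. The sketch below tries this on a toy table. It makes several assumptions:

* SQLite has no `SUBSTRING_INDEX`, so the sketch registers a hypothetical Python emulation of MySQL's semantics under that name.
* The table is an in-memory stand-in for `sentences`.
* The helper `top_beginnings` and the sample sentences are illustrative only and are not the corpus data.
* It groups in a subquery rather than repeating the expression.
* It adds a tie-break on `beg` so that the output order is deterministic.

```python
import sqlite3


def substring_index(s, delim, count):
    """Hypothetical emulation of MySQL's SUBSTRING_INDEX(str, delim, count).

    count > 0: text before the count-th occurrence of delim, from the left.
    count < 0: text after the |count|-th occurrence of delim, from the right.
    If delim occurs fewer times than |count|, the whole string is returned.
    """
    if s is None or delim is None or count is None:
        return None
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def top_beginnings(sentences, n=1, limit=50):
    """Return (beginning, count) pairs for n-word sentence beginnings."""
    conn = sqlite3.connect(":memory:")
    # SQLite lacks SUBSTRING_INDEX, so register the Python emulation above.
    conn.create_function("substring_index", 3, substring_index)
    conn.execute("CREATE TABLE sentences (sentence TEXT)")
    conn.executemany(
        "INSERT INTO sentences (sentence) VALUES (?)",
        [(s,) for s in sentences],
    )
    rows = conn.execute(
        """
        SELECT beg, count(*) AS cnt
        FROM (SELECT substring_index(sentence, ' ', ?) AS beg FROM sentences)
        GROUP BY beg
        ORDER BY cnt DESC, beg  -- tie-break on beg for a stable order
        LIMIT ?
        """,
        (n, limit),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    sample = [  # illustrative sentences, not taken from the corpus
        "El presidente habló hoy.",
        "El país celebra.",
        "La ciudad duerme.",
        "El presidente viajó ayer.",
    ]
    print(top_beginnings(sample, n=1))  # → [('El', 3), ('La', 1)]
    print(top_beginnings(sample, n=2))
    # → [('El presidente', 2), ('El país', 1), ('La ciudad', 1)]
```

Like the original query, this splits on single spaces only. Punctuation therefore stays attached to the word it touches, and repeated spaces produce empty tokens.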
4.3.1.2 Most Frequent Sentence Beginnings II
4.3.1.3 Most Frequent Sentence Beginnings III
4.3.1.4 Most Frequent Sentence Beginnings IV
4.3.1.1 Most Frequent Sentence Endings I
4.3.1.2 Most Frequent Sentence Endings II
4.3.1.3 Most Frequent Sentence Endings III
4.3.1.4 Most Frequent Sentence Endings IV